Added LabelBelowBox and AlternativeLabelling settings #41
Conversation
Small contribution.

LabelBelowBox setting: when True, it places each text label below the object's box. Default: False. This is because in my case my cameras usually detect persons near the top of the image, and labels were being cut off, out of frame.

AlternativeLabelling setting: when True, and DrawMode is also set to "Match", the label shows just a reference number, which is then correlated with the text listing each object in the Telegram or Pushbullet notification. Default: False. This is because in my case, when more than one person is detected walking in a group, labels ended up on top of each other and were illegible.
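For illustration, a minimal sketch of how the two new settings might sit in appsettings.json (the surrounding structure and exact placement are assumptions for this example; only LabelBelowBox, AlternativeLabelling, DrawMode and their defaults come from this PR):

```json
{
  "DrawMode": "Match",
  "LabelBelowBox": true,         // place each text label below its box instead of above it (default: false)
  "AlternativeLabelling": true   // with DrawMode "Match": draw only a reference number; details go in the notification text (default: false)
}
```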
After some hours of letting it detect objects with my camera, I made some corrections to my two new setting parameters. The alternative labelling setting now produces a leaner output, ideal for Telegram: the image is less cluttered and the detected-objects list is more concise.
Also, right now I am implementing the following:
... because in my case (I own a DS920+) I understand (please correct me if I am wrong) that once SSS detects movement and the action rule triggers, it will wait a minimum of 5 seconds before sending another trigger (if there is still movement). But AFAIK the DeepStack AI takes less than a second to analyze that image frame. So the whole cycle of "SynoAI asks for a snapshot, retrieves it, passes it to DeepStack, waits for the result and then evaluates it" probably takes not much more than one second in total. If so, SynoAI could ask for 2 or 3 more frames and greatly increase the chances of detecting someone bicycling, walking, a motorcycle, or whatever is of interest that takes less than 5 seconds to pass in front of the camera. So it may be interesting to see if I can grab two or three more snapshots and process them until I find an object. What do you think, @djdd87? Regards, Enrique. |
Hey, these are some good ideas thanks. I'll try and get them merged in today at some point (if not this weekend). |
Correct - but this is down to your configuration. You can change the "Delay" config (see the README) to any value you like. The default is 5000ms, but you can reduce it.
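For reference, a minimal sketch of what lowering it could look like in appsettings.json (the key name and default come from the README; the value and surrounding structure are assumed for this example):

```json
{
  "Delay": 2000  // milliseconds before SynoAI will process another motion event; default is 5000
}
```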
No, probably not, but it can be, depending on hardware, hence the config value.
I think this is basically #35? Have a read of that and let me know if that's what you're after. It'd be good to get that functionality in, I just haven't had a chance to test the impact it'll have on performance. |
Thanks for your comments, @djdd87! In particular, about the delay and the 5000ms default: that is from SynoAI's point of view, but on Synology Surveillance Station the action rule AFAIK has a minimum trigger interval of 5 seconds. I can see motion on my cam that happens in less than 5 seconds (a person crossing my sidewalk from side to side, fast walking or running). So sometimes (actually most of the time) when SynoAI receives the motion hook from SSS and grabs a frame, the person is partially behind something and DeepStack cannot find them. If I could keep grabbing more frames right after that first one, I would most probably end up with a better view of the person, since I could process up to 5 frames inside that 5-second interval imposed by the official motion action rule.

About being basically the same as #35: it is only similar, because I would not grab 3 or 4 frames and compare for the best percentage, but rather stop asking for frames and send the notification alert as soon as the first object is detected. If it is in the first frame, that is it; if it takes 3 frames, then it happens at the third frame. I am prioritizing "I just saw a person (in my case, the thing I am interested in)" over "this is the best shot of the person" (which in any case is a bit of a throwaway, since it is only about 640 x 480, and I can always inspect the real high-quality stream later, which is at a far better resolution). I think we are talking about the same thing; I just wanted to clarify my idea in case you can see some fault in my reasoning. |
One thing I don't like about DeepStack is the lack of control we have over it, i.e. we cannot easily limit the object types it is looking for. I've been playing a bit with Intel's OpenVINO and I got a far better response time when detecting a smaller set of objects, like limiting it to persons, cars, vehicles, bicycles and motorcycles ... perhaps even cats and dogs. My testing was done inside a virtual machine running Windows 10, running a Visual Studio project that invoked the OpenVINO framework while showing the frames being processed on-screen at Full HD, and I got about 4 or 5 frames per second (this is a 2012 MacBook, and the Windows 10 VM was VERY slow). So I am quite sure (far) better speed could be achieved by changing the AI, or perhaps fine-tuning this DeepStack one. On the other hand, I suspect that fiddling with the internals of the DeepStack engine is, for now, beyond my understanding. "Food for thought" |
Isn't this controlled by the "Interval (seconds)" of the action rule?
I see your point! This would definitely be worth resolving.
Yep - so I think this is basically an extension/configuration option to #35. We could add some configs along the lines of:

"NumberOfSnapshots": 5,
"SnapshotMode": "Best" // Or "First"
|
I'm definitely up for supporting different AIs. It's definitely something I'd like to expand on anyway, and the software was always written with that in mind. |
@djdd87: on your comment "Isn't this controlled by the Interval (seconds) of the action rule?" - yes, but the fastest it can be set to is every 5 seconds (I cannot type any other number, just select 5, 10, 15, etc. from a dropdown list) in the macOS Synology client. |
Also, did you see this? https://www.youtube.com/watch?v=G_w-ncKqTG4 I am trying to get it running and compare speed against the vanilla DeepStack. Edit: I got it running against SynoAI (I had to change one line in the code, which could be made configurable through a parameter):
... And then for the DeepStack container: first I downloaded the custom dark-trained model file into a folder on the Synology. Then, in the environment variables, I had to set VISION-DETECTION to False. After that, under Volumes I had to mount /modelstore/detection onto a folder inside my Synology and put the trained model dark.pt there, which can be downloaded from: https://github.com/OlafenwaMoses/DeepStack_ExDark/releases/tag/v1 Finally, I had to edit my appsettings.json for SynoAI: under the camera config, the detection object is "People" instead of "person". The full list of trained objects can be found at: https://github.com/OlafenwaMoses/DeepStack_ExDark/releases/tag/v1

My first impressions: this is far slower than the default VISION-DETECTION included in DeepStack ... it takes almost twice the time to process an image. I am trying to fine-tune this thing, so maybe I can get it faster. On B&W night camera images it seems to be twice as good at detecting "People".

UPDATE: After one night ... nope. It is not as good as it seemed. Far fewer detections, and almost useless in daylight. Back to the standard DeepStack model. At least now I understand that it is possible to change models. It is a matter of finding (or training) a good model tailored for security cameras. |
Added new config parameter "Path" inside the "AI" section of appsettings.json, allowing you to change the path SynoAI uses to send the snapshot to and wait for results from DeepStack. This is useful if you want to swap the included object-detection model for a custom one. Path: default value is "v1/vision/detection" for the standard DeepStack trained model. E.g. if you want to use a third-party / custom module, like the DeepStack Dark Scene Objects Detection, this value needs to be "v1/vision/custom/dark".

Added new config parameter "MaxSnapshots" inside the general section of appsettings.json. Default value is 1, max value is 254. Upon receiving a motion-detect trigger from Synology Surveillance Station, this parameter controls how many snapshots SynoAI keeps retrieving and analyzing until it finds a valid object; then it stops and sends an alert notification. This greatly enhances detection in certain scenarios (like mine), because the first limitation comes from Synology: the fastest you can trigger a motion event is once every 5 seconds. BUT my DS920+ can process a 640x480 snapshot in about 650 ms, even with DeepStack configured at "Medium" quality! So inside those 5 seconds I could actually inspect 7 or even 8 frames for the object I want to be alerted about ("Person").

In my scenario, a person can walk across the camera's field of view in LESS than 5 seconds. Usually, motion detection from SSS will fire when a person is entering the frame from the side, with only their head visible. If SynoAI takes only that snapshot and sends it to DeepStack, it will not detect a person. But if I keep retrieving frames inside those 5 seconds, the person will finally appear in full by the 2nd or 3rd frame, and DeepStack is able to find them. So I greatly increased the chances of detecting people by letting SynoAI take several snapshots once motion is detected, instead of letting SSS dictate the timing for each snapshot. I actually increased the SSS trigger interval from 5 seconds to 20 seconds, so now when I get a motion trigger from SSS, SynoAI takes control of the situation and starts taking up to 20 snapshots inside that 20-second window. Two benefits: 1) It can detect persons that were missed in the earlier "one snapshot every 5 seconds" scenario. 2) If someone is standing in front of the camera, I get ONE notification every 20 seconds, because SSS now triggers the motion event every 20 seconds and SynoAI sends a notification as soon as it first detects a person and STOPS for that run.

Lastly, I deleted the configuration parameter "Delay", since the delay is actually given by the trigger event configured in Synology Surveillance Station (minimum 5 seconds), so there is no real need for SynoAI to also handle a delay.
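A minimal sketch of how these two new parameters could look in appsettings.json, per the description above (the values shown are just examples taken from the setup described here; surrounding keys are omitted):

```json
{
  "MaxSnapshots": 20,                  // general section; default 1, max 254
  "AI": {
    "Path": "v1/vision/custom/dark"    // default "v1/vision/detection"; change only when using a custom model
  }
}
```

Remember that when using the ExDark custom model, the camera's detection type also has to be "People" instead of "person", as noted above.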
Thanks for the investigation there @euquiq. I've pulled the request into a development branch. I'd like to do a bit of tidying and then I'll merge it into main. |
Questions:
I've made my tidying-up changes inside the development branch if you want to check them out. |
Probably there is no need for a limit there ... I think I was trying to guard against a possible bottleneck, because I did not test exactly what happens if SynoAI is currently processing frames from an earlier motion trigger and, while at it, a new motion trigger arrives from SSS: would it start a second "thread" and make DeepStack process more than one frame at a time? Or is SynoAI single-threaded, so it would just drop the earlier processing task and start a new one? I was afraid of the first scenario and thought that limiting the number of frames would keep the bottleneck from getting too expensive. If there is no potential bottleneck, then that limit should be removed: it would be nice to be able to put in whatever number you want. Say your Synology has a really fast processor and your DeepStack container can inspect one frame every 200 ms; then you could set the SSS trigger interval to 60 seconds, for example, and (for far better chances of detecting something) extend the SynoAI / DeepStack analysis to 300 frames, covering the whole minute in search of e.g. a person. |
Hey, it should start a new thread the next time it receives motion now... which might not be desirable. That was a use case of the Delay flag you removed. So if the Delay was set to 5000ms and the process took 4999ms, then there wouldn't be multiple threads running. However, if the process took 6000ms and another event was called by SSS at 5001ms, then a second thread would start running whilst the original process is still running. I think this will now cause an issue and could cause duplicate notifications. It's now possible to have multiple processes running. If you set "MaxSnapshots" to 100 and each snapshot and AI check takes 2 seconds to process and it never finds anything, then it'll be processing for 200 seconds. During those 200 seconds, if SSS detects motion again, it'll kick off a new process, which could also run 100 times and take even longer (and slow the other thread down). It should be simple enough to keep a flag against each camera to track whether it's running or not. That way, if SSS triggers the camera again during those 200 seconds, the trigger gets ignored, which is fine, as SynoAI is checking anyway. I've also:
|
Not sure I'm following this. Are you suggesting having the option to do away with the SSS triggers and having the ability to just continually poll the camera? If so, this should be relatively straightforward and I'm happy to add that functionality. I'm not sure my NAS would like it though.
|
Well, it was because now that the texts for the notifications are crafted earlier in the code and we are storing basically just a string array, I thought that a List would simplify things and maybe even be faster (no LINQ), but my knowledge of C# is patchy (self-taught) and I am probably wrong. I am reading up on IEnumerable vs List right now. |
IList inherits from IEnumerable. I prefer IEnumerable as it's suggestive to C# developers to not go fiddling with the underlying data. |
I am using SynoAI as follows:
So whenever SSS sends a motion event, for the next 20 seconds SynoAI will keep asking for snapshots and analyzing them. If motion is still occurring after those first 20 seconds, SSS will trigger another motion event, and SynoAI will again start grabbing frames and testing for persons for the next 20 seconds ... and so on. The truth is that thanks to this "hybrid" behavior, where SSS just triggers SynoAI every 20 seconds and SynoAI in turn keeps looking at the camera inside that 20-second lapse, I am able to detect persons that would otherwise be missed. Sure, we could also eliminate the SSS motion trigger and just let SynoAI snapshot / analyze / notify on valid objects forever. But as you say, that would be a bit harsh on the NAS, and it is only valid for objects that appear and then disappear from the camera. If you are looking for CARS and the car is stationary, this "pure SynoAI snapshot processing mode" would trigger endless notifications. Hence, nope, it may not be worth it. It seems better to keep using SSS motion triggering as the enabler for SynoAI processing one or more snapshots inside a small, finite time window (like the 20 seconds I am using, or even more). |
On the other hand, used "judiciously", the "CONTINUOUS" mode might be OK. Question: while SynoAI is asking SSS for snapshots and then passing them into DeepStack ... there is no saving to the disks, or am I missing something? (I recall the disk is not being used for saving anything, as the snapshot is stored in memory.) I have no idea if DeepStack uses storage space for saving data on each call while processing the image. If neither SynoAI nor DeepStack uses the disk for saving (until a valid object is found), then I could even envision a user placing an SSD into one of its bays as temporary storage for snapshots (or even having the whole NAS running on SSDs), benefiting from the "continuous" mode AND the SSD speed (which would probably shave an extra 100 ms off retrieving each snapshot). |
@euquiq I was thinking today: how about, instead of the "AlternateLabelling" setting, we have configs for what to display on the label? e.g.

{
  "ImageLabel": {
    "ShowNumber": true,
    "ShowDescription": true
  },
  "MessageLabel": {
    "ShowNumber": true,
    "ShowDescription": true
  }
}

Where ImageLabel controls what is drawn on the snapshot image, MessageLabel controls what goes into the notification text, and ShowNumber / ShowDescription toggle the reference number and the object description respectively.
|
It sounds like an interesting idea. I am wondering, though, if some combinations would be rather redundant (like ShowDescription on both image and message), but then maybe that redundancy makes things even more evident, and thus plainer in consequence.

In any case, it is very desirable for the user to be able to build up the level of detail that suits him or her best. I am also wondering about this idea: I would very much like to receive high-resolution images in my notifications through Telegram, as most of the time it is faster for me to check my Telegram app than to fire up DS CAM and try to peek at who is, e.g., ringing my front door. But at this time SynoAI uses 640 x 480 snapshots, and most of the time that is just not enough resolution to get much detail. So:

"HiResNotification": true -> would ask for a high-resolution snapshot and scale it down to 640x480 before sending it into DeepStack at this lower resolution. If a valid object is detected, SynoAI translates its coordinates into the proper position in the high-resolution snapshot, boxing the object(s) and even placing the corresponding text / numbering, and sends that hi-res snapshot in the notification. I am betting that the time consumed by all those tasks is not enough to make this idea non-viable.

Another alternative: "ReturnHighResObject": true -> grab a high-res snapshot, scale it down for DeepStack, and if an object is found, return (with some coordinate translation) just the object itself cut out from the high-res snapshot, so it is in full detail, at high resolution. If there is more than one valid object, calculate the coordinates encompassing all those objects and return that part of the image, always cut from the original high-res snapshot. |
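To make the coordinate translation concrete, a minimal sketch (assuming the AI ran on a $W_{lo} \times H_{lo}$ downscale of a $W_{hi} \times H_{hi}$ snapshot, and that DeepStack reports box corners in the downscaled frame):

$$x_{hi} = x_{lo} \cdot \frac{W_{hi}}{W_{lo}}, \qquad y_{hi} = y_{lo} \cdot \frac{H_{hi}}{H_{lo}}$$

e.g. a box corner at (320, 240) in a 640 x 480 frame maps to (320 * 2304/640, 240 * 1296/480) = (1152, 648) in a 2304 x 1296 frame.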
I prefer those config names you suggested. I'll get them implemented. In regards to the high res images, why can't you just use the "Quality" config? SynoAI doesn't just use 640x480, it uses whatever stream you tell it to. My images are all at 1080p. |
In my case, when I allow "High" (on my camera, this is 2304 x 1296) for snapshot quality, overall object detection drops to just 1/3 of the usual recognition rate compared to "Balanced" (640x480), so I thought DeepStack was getting "dumbfounded" by such hi-res images (3 megapixels, a bit more than Full HD) ... even though, according to their GitHub code, DeepStack actually rescales whatever image you throw at it into different (small) resolutions depending on its Low, Medium or High quality setting. Maybe their downscaling algorithm is not that good? I don't know. Also, as a side note, while it should not be a huge problem, SynoAI takes about 220 - 250 ms to retrieve a hi-res snapshot (which is roughly 100 ms more than the 640x480 "Balanced" in my case). In the DeepStack log, it reports "about" the same time for "VISION DETECTION" on each frame regardless of snapshot resolution, so it seems that logged time covers just the AI job (after downscaling), I suppose. I will investigate a bit more, because it should not be missing more people. But it is! |
Just tested it a bit more and I can confirm that when I use "Balanced" (640 x 480) as SynoAI snapshot input, I get far more valid objects (persons) being recognized by Deepstack than when I use "High" (2304 x 1296). |
Another thing I am noticing is that the DeepStack container always shows about 38% CPU usage, whether it is idling or detecting objects, which is truly baffling. For a start, I would like that container to jump to high CPU usage each time it needs to analyze an image. On the other hand, when idle, it should be just like the SynoAI container: 1% or less CPU usage! |
Hi there @djdd87!
Maybe you will find it interesting to incorporate this small contribution into your code (in whole, in part, or just as an idea for yet another enhancement).
LabelBelowBox setting: when True, it will place each text label below the object's box inside the image. Default: False.
This is because in my case my cameras usually detect persons near the top of the image, so labels were being cut off, out of frame.
AlternativeLabelling setting: when True, and DrawMode is also set to "Match", the label will show just a reference number, which will then be correlated with the text listing each object in the Telegram or Pushbullet notification. Default: False.
This is because in my case, when more than one person is detected walking in a group, sometimes labels ended up on top of each other and were illegible.
Also, I minimized the text sent in each notification, and added a bit more logic so it doesn't say "object" if I am only searching for "person" on that camera. This helped me unclutter the Telegram screen, while also fitting almost three complete notifications on my phone screen:
Sorry for such big images. Had no time to scale them down, but wanted to show you the "alternative labelling". If that setting is "false" then everything is shown as before (like your current version).
Cheers! Enrique.